AITopics | marginal effect

marginal effect

12ced2db6f0193dda91ba86224ea1cd8-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 13:36:19 GMT

hyperparameter, hyperparameter space, pdp, (17 more...)

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
Europe > Germany > Lower Saxony > Hanover (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Towards a Comprehensive Scaling Law of Mixture-of-Experts

Zhao, Guoliang, Fu, Yuhan, Li, Shuaipeng, Sun, Xingwu, Xie, Ruobing, Wang, An, Han, Weidong, Yang, Zhen, Sun, Weixuan, Zhang, Yudong, Xu, Cheng-zhong, Wang, Di, Jiang, Jie

arXiv.org Artificial Intelligence, Sep-30-2025

Mixture-of-Experts (MoE) models have become the consensus approach for enabling parameter-efficient scaling and cost-effective deployment in large language models. However, existing scaling laws for dense models are inapplicable to MoE models, a limitation that stems from three critical challenges: the multiplicity of influencing factors, their intricate coupling relationships, and the non-monotonic nature of their performance impacts. Specifically, we design 446 controlled experiments to characterize their marginal effects, ultimately constructing a comprehensive and precise joint MoE scaling law that considers all essential factors. Our results demonstrate that the optimal settings for G and S are independent of both the model architecture and data size. Our proposed MoE scaling law can serve as accurate and insightful guidance for future MoE model design and training. Large language models (LLMs) have been widely verified and utilized in our daily lives. It is impressive and fortunate that LLMs can continuously expand their ability boundaries with increasing model and training data sizes. The scaling laws of LLMs (Kaplan et al., 2020; Hoffmann et al., 2022; Sun et al., 2025), which can predict the model loss from crucial factors (e.g., data/model sizes) before training, shed light on a promising way of selecting appropriate model structures and settings before experiments and of continuously enhancing the ability of LLMs under a given training budget or environment constraints. Recently, Mixture-of-Experts (MoE) has become one of the mainstream structures broadly used in powerful industry-level LLMs (Dubey et al., 2024; Liu et al., 2024; Sun et al., 2024; Liu et al., 2025; Qwen Team et al., 2025; OpenAI et al., 2025).

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2509.23678

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

When People are Floods: Analyzing Dehumanizing Metaphors in Immigration Discourse with Large Language Models

Mendelsohn, Julia, Budak, Ceren

arXiv.org Artificial Intelligence, Feb-18-2025

Metaphor, discussing one concept in terms of another, is abundant in politics and can shape how people understand important issues. We develop a computational approach to measure metaphorical language, focusing on immigration discourse on social media. Grounded in qualitative social science research, we identify seven concepts evoked in immigration discourse (e.g. "water" or "vermin"). We propose and evaluate a novel technique that leverages both word-level and document-level signals to measure metaphor with respect to these concepts. We then study the relationship between metaphor, political ideology, and user engagement in 400K US tweets about immigration. While conservatives tend to use dehumanizing metaphors more than liberals, this effect varies widely across concepts. Moreover, creature-related metaphor is associated with more retweets, especially for liberal authors. Our work highlights the potential for computational methods to complement qualitative approaches in understanding subtle and implicit language in political discourse.

large language model, machine learning, natural language, (20 more...)

2502.13246

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(16 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Immigration & Customs (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Machine Unlearning via Information Theoretic Regularization

Xu, Shizhou, Strohmer, Thomas

arXiv.org Machine Learning, Feb-11-2025

How can we effectively remove or "unlearn" undesirable information, such as specific features or individual data points, from a learning outcome while minimizing utility loss and ensuring rigorous guarantees? We introduce a mathematical framework based on information-theoretic regularization to address both feature and data point unlearning. For feature unlearning, we derive a unified solution that simultaneously optimizes diverse learning objectives, including entropy, conditional entropy, KL-divergence, and the energy of conditional probability. For data point unlearning, we first propose a novel definition that serves as a practical condition for unlearning via retraining, is easy to verify, and aligns with the principles of differential privacy from an inference perspective. Then, we provide provable guarantees for our framework on data point unlearning. By combining flexibility in learning objectives with simplicity in regularization design, our approach is highly adaptable and practical for a wide range of machine learning and AI applications.

artificial intelligence, machine learning, optimization problem, (16 more...)

2502.05684

Country:

North America > United States > California > Yolo County > Davis (0.14)
Europe (0.14)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Causal Inference Tools for a Better Evaluation of Machine Learning

Soumm, Michaël

arXiv.org Artificial Intelligence, Oct-2-2024

We present a comprehensive framework for applying rigorous statistical techniques from econometrics to analyze and improve machine learning systems. We introduce key statistical methods such as Ordinary Least Squares (OLS) regression, Analysis of Variance (ANOVA), and logistic regression, explaining their theoretical foundations and practical applications in machine learning evaluation. The document serves as a guide for researchers and practitioners, detailing how these techniques can provide deeper insights into model behavior, performance, and fairness. We cover the mathematical principles behind each method, discuss their assumptions and limitations, and provide step-by-step instructions for their implementation. The paper also addresses how to interpret results, emphasizing the importance of statistical significance and effect size. Through illustrative examples, we demonstrate how these tools can reveal subtle patterns and interactions in machine learning models that are not apparent from traditional evaluation metrics. By connecting the fields of econometrics and machine learning, this work aims to equip readers with powerful analytical tools for more rigorous and comprehensive evaluation of AI systems. The framework presented here contributes to developing more robust, interpretable, and fair machine learning technologies.
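As a rough illustration of the kind of analysis this abstract describes, the sketch below fits a one-regressor OLS model that relates an evaluation metric to a single experimental factor. It is not taken from the paper: the function name `ols_fit`, the binary factor, and the accuracy values are all hypothetical toy assumptions, and a real analysis would also report standard errors and significance.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data: 0/1 marks whether a training option was enabled,
# and accuracy is the test accuracy observed in each run.
option_enabled = [0, 0, 1, 1]
accuracy = [0.80, 0.82, 0.85, 0.87]

intercept, slope = ols_fit(option_enabled, accuracy)
print(f"baseline accuracy: {intercept:.3f}, estimated effect: {slope:.3f}")
```

With a binary regressor, the slope equals the difference between the two group means, which makes the coefficient easy to read as an effect size.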

assumption, regression, variance, (17 more...)

2410.01392

Country:

North America > United States (0.04)
Europe > Slovakia > Bratislava > Bratislava (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Attribution Methods in Asset Pricing: Do They Account for Risk?

Chen, Dangxing, Gao, Yuan

arXiv.org Artificial Intelligence, Jul-11-2024

Over the past few decades, machine learning models have been extremely successful. As a result of axiomatic attribution methods, feature contributions have been explained more clearly and rigorously. There are, however, few studies that have examined domain knowledge in conjunction with the axioms. In this study, we examine asset pricing in finance, a field closely related to risk management. Consequently, when applying machine learning models, we must ensure that the attribution methods reflect the underlying risks accurately. In this work, we present and study several axioms derived from asset pricing domain knowledge. It is shown that while Shapley value and Integrated Gradients preserve most axioms, neither can satisfy all axioms. Using extensive analytical and empirical examples, we demonstrate how attribution methods can reflect risks and when they should not be used.

interest rate, stock price, volatility, (16 more...)

2407.08953

Country:

North America > United States > California (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Policy design in experiments with unknown interference

Viviano, Davide, Rudder, Jess

arXiv.org Artificial Intelligence, Dec-28-2023

This paper studies experimental designs for estimation and inference on policies with spillover effects. Units are organized into a finite number of large clusters and interact in unknown ways within each cluster. First, we introduce a single-wave experiment that, by varying the randomization across cluster pairs, estimates the marginal effect of a change in treatment probabilities, taking spillover effects into account. Using the marginal effect, we propose a test for policy optimality. Second, we design a multiple-wave experiment to estimate welfare-maximizing treatment rules. We provide strong theoretical guarantees and an implementation in a large-scale field experiment.

experiment, marginal effect, probability, (16 more...)

2011.08174

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Indonesia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Food & Agriculture > Agriculture (1.00)
Health & Medicine > Therapeutic Area > Vaccines (0.67)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.45)

Which linguistic cues make people fall for fake news? A comparison of cognitive and affective processing

Lutz, Bernhard, Adam, Marc, Feuerriegel, Stefan, Pröllochs, Nicolas, Neumann, Dirk

arXiv.org Artificial Intelligence, Dec-2-2023

Fake news on social media has large, negative implications for society. However, little is known about what linguistic cues make people fall for fake news and, hence, how to design effective countermeasures for social media. In this study, we seek to understand which linguistic cues make people fall for fake news. Linguistic cues (e.g., adverbs, personal pronouns, positive emotion words, negative emotion words) are important characteristics of any text and also affect how people process real vs. fake news. Specifically, we compare the role of linguistic cues across both cognitive processing (related to careful thinking) and affective processing (related to unconscious automatic evaluations). To this end, we performed a within-subject experiment in which we collected neurophysiological measurements from 42 subjects while they read a sample of 40 real and fake news articles. During our experiment, we measured cognitive processing through eye fixations, and affective processing in situ through heart rate variability. We find that users engage more in cognitive processing for longer fake news articles, while affective processing is more pronounced for fake news written in analytic words. To the best of our knowledge, this is the first work studying the role of linguistic cues in fake news processing. Altogether, our findings have important implications for designing online platforms that encourage users to engage in careful thinking and thus prevent them from falling for fake news.

affective processing, cognitive processing, linguistic cue, (13 more...)

2312.03751

Country:

Asia > Russia (0.14)
Europe > Germany > Baden-Württemberg > Freiburg (0.05)
North America > United States > New York (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media > News (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.47)

fmeffects: An R Package for Forward Marginal Effects

Löwe, Holger, Scholbeck, Christian A., Heumann, Christian, Bischl, Bernd, Casalicchio, Giuseppe

arXiv.org Machine Learning, Oct-3-2023

Forward marginal effects (FMEs) (Scholbeck et al., 2022) provide simple yet accurate local model-agnostic explanations in terms of forward differences in prediction. They address questions of the form: If we change x by an amount h, what is the change in predicted outcome ŷ? For instance, given a medical study where a model is trained to predict a patient's disease risk, FMEs can tell us each patient's individual change in predicted risk due to losing 5 kg in body weight. FMEs thus provide actionable and comprehensible advice for stakeholders, including ones without expertise in machine learning. If the change in predicted risk is substantial enough, doctors may recommend a tailored exercise and nutrition regimen.
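The forward difference described in this abstract can be sketched in a few lines. The Python below is only an illustration of the idea (prediction at x shifted by h, minus prediction at x) and is not the fmeffects R API; the function `forward_marginal_effect`, the linear `toy_risk` model, its coefficients, and the patient values are all hypothetical assumptions.

```python
def forward_marginal_effect(predict, x, feature, h):
    """Return predict(x with one feature shifted by h) minus predict(x)."""
    shifted = list(x)        # copy so the original observation is not modified
    shifted[feature] += h    # apply the step h to the chosen feature
    return predict(shifted) - predict(x)


def toy_risk(x):
    """Hypothetical risk model, linear in body weight (kg) and age (years)."""
    weight, age = x
    return 0.002 * weight + 0.001 * age


# One hypothetical patient: 90 kg, 50 years old.
patient = [90.0, 50.0]

# Change in predicted risk if this patient loses 5 kg (h = -5 on feature 0).
effect = forward_marginal_effect(toy_risk, patient, feature=0, h=-5.0)
print(f"change in predicted risk: {effect:.4f}")
```

Because the effect is computed per observation from any prediction function, the same helper applies to non-linear models, where the result generally differs from patient to patient.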

artificial intelligence, machine learning, subgroup, (17 more...)

2310.02008

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
